Filtering High-Dimensional Methylation Marks With Extremely Small Sample Size: An Application to Gastric Cancer Data

Abstract

DNA methylations in critical regions are highly involved in cancer pathogenesis and drug response. However, identifying causal methylations out of a large number of potential polymorphic DNA methylation sites is challenging. This high-dimensional data brings two obstacles: first, many established statistical models are not scalable to so many features; second, multiple testing and overfitting become serious. To this end, a method that quickly filters candidate sites to narrow down targets for downstream analyses is urgently needed. BACkPAy is a pre-screening Bayesian approach to detect biologically meaningful patterns of potential differential methylation levels with a small sample size. BACkPAy prioritizes potentially important biomarkers by the Bayesian false discovery rate (FDR) approach. It filters out non-informative (i.e., non-differential) sites with flat methylation patterns across experimental conditions. In this work, we applied BACkPAy to a genome-wide methylation dataset with three tissue types, each of which contains three gastric cancer samples. We also applied LIMMA (Linear Models for Microarray and RNA-Seq Data) and compared its results with those achieved by BACkPAy. Then, Cox proportional hazards regression models were utilized to visualize prognostically significant markers with The Cancer Genome Atlas (TCGA) data for survival analysis. Using BACkPAy, we identified eight biologically meaningful patterns/groups of differential probes from the DNA methylation dataset. Using TCGA data, we also identified five prognostic genes (i.e., predictive of the progression of gastric cancer) that contain some differential methylation probes, whereas no differential probe was identified using the Benjamini-Hochberg FDR in LIMMA. We showed the importance of BACkPAy for the analysis of DNA methylation data with an extremely small sample size in gastric cancer. We revealed that RDH13, CLDN11, TMTC1, UCHL1, and FOXP2 can serve as predictive biomarkers for gastric cancer treatment, and that the promoter methylation level of these five genes in serum could have prognostic and diagnostic functions in gastric cancer patients.
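The abstract contrasts two ways of controlling false discoveries. One is BACkPAy's Bayesian FDR over posterior probabilities. The other is the Benjamini-Hochberg (BH) procedure applied to LIMMA p-values. The Python sketch below sets a textbook BH step-up procedure beside a generic Bayesian FDR selection rule. That rule ranks probes by their posterior probability of being differential. It then keeps the longest prefix whose average posterior probability of being null stays at or below alpha. This is an illustration under stated assumptions, not BACkPAy's implementation. BACkPAy's own model for computing posterior probabilities over methylation patterns is not reproduced here. The function names `benjamini_hochberg` and `bayesian_fdr_select` are hypothetical, and all input numbers are made up.

```python
"""Contrast two FDR-based filters on a set of probes.

Hypothetical illustration only: the inputs are made-up numbers, not values
from the gastric cancer study, and this is not BACkPAy's actual code.
"""
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float], alpha: float = 0.05) -> List[int]:
    """Return indices of probes rejected by the BH step-up procedure."""
    m = len(pvalues)
    # Sort probe indices by ascending p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k (1-based) with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k_max = rank
    # Reject every probe whose rank is at most k_max.
    return sorted(order[:k_max])


def bayesian_fdr_select(posterior_diff: Sequence[float], alpha: float = 0.05) -> List[int]:
    """Select probes by posterior probability of being differential.

    Probes are added in order of decreasing posterior probability while the
    estimated Bayesian FDR (mean posterior probability of being null among
    the selected probes) stays at or below alpha. Because that mean can only
    grow as weaker probes are added, stopping at the first violation yields
    the longest admissible prefix.
    """
    order = sorted(range(len(posterior_diff)), key=lambda i: -posterior_diff[i])
    selected: List[int] = []
    null_mass = 0.0  # running sum of (1 - posterior) over selected probes
    for idx in order:
        candidate_null = null_mass + (1.0 - posterior_diff[idx])
        if candidate_null / (len(selected) + 1) > alpha:
            break
        null_mass = candidate_null
        selected.append(idx)
    return sorted(selected)


if __name__ == "__main__":
    # Made-up per-probe p-values (e.g. from a moderated t-test).
    pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
    print(benjamini_hochberg(pvals))  # [0, 1]
    # Made-up posterior probabilities that each probe is differential.
    post = [0.99, 0.97, 0.90, 0.40, 0.10]
    print(bayesian_fdr_select(post))  # [0, 1, 2]
```

The two rules act on different inputs, p-values versus posterior probabilities, so their outputs are not comparable probe by probe. The sketch shows only the mechanics of each threshold. It does not reproduce the study's result.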

Similar articles

Feature Selection for Small Sample Sets with High Dimensional Data Using Heuristic Hybrid Approach

Feature selection can significantly be decisive when analyzing high dimensional data, especially with a small number of samples. Feature extraction methods do not have decent performance in these conditions. With small sample sets and high dimensional data, exploring a large search space and learning from insufficient samples becomes extremely hard. As a result, neural networks and clustering a...

An Efficient Dimensionality Reduction Approach for Small-sample Size and High-dimensional Data Modeling

As for massive multidimensional data are being generated in a wide range of emerging applications, this paper introduces two new methods of dimension reduction to conduct small-sample size and high-dimensional data processing and modeling. Through combining the support vector machine (SVM) and recursive feature elimination (RFE), SVM-RFE algorithm is proposed to select features, and further, ad...

T3-Plot for Testing Spherical Symmetry for High-Dimensional Data with a Small Sample Size

High-dimensional data with a small sample size, such as microarray data and image data, are commonly encountered in some practical problems for which many variables have to be measured but it is too costly or time consuming to repeat the measurements for many times. Analysis of this kind of data poses a great challenge for statisticians. In this paper, we develop a new graphical method for test...

Shrinkage-based diagonal Hotelling's tests for high-dimensional small sample size data

High-throughput expression profiling techniques bring novel tools and also statistical challenges to genetic research. In addition to detecting differentially expressed genes, testing the significance of gene sets or pathway analysis has been recognized as an equally important problem. Owing to the “large p small n” paradigm, the traditional Hotelling’s T² test suffers from the singularity p...

Multi-dimensional data construction method with its application to learning from small-sample-sets

Insufficient training data is one of the major problems in neural network learning, because it leads to poor learning performance. In order to enhance an intelligent learning process, it is necessary to exploit the features of the problem from the available information even with limited scale. Due to the shortcomings of the existing methods for data generation; and also in general, a problem is...

Journal

Journal title: Frontiers in Genetics

Year: 2021

ISSN: 1664-8021

DOI: https://doi.org/10.3389/fgene.2021.705708